Considerable effort has gone into Computational Auditory Scene Analysis (CASA) for segregating target speech from monaural mixtures. Based on the principles of CASA, this article proposes an improved algorithm for monaural speech segregation. To extract the energy feature more accurately, the proposed algorithm improves the threshold selection for response energy in the initial segmentation stage. Because the mask map produced by the grouping stage often contains broken auditory element groups, a smoothing stage based on morphological image processing is added. By combining erosion and dilation operations, this stage suppresses intrusions by removing unwanted particles and enhances the segregated speech by filling in the broken auditory elements. Systematic evaluation shows that, compared with the mixture, the proposed algorithm improves the output signal-to-noise ratio by an average of 8.55 dB and reduces the percentage of noise residue by an average of 25.36%, a significant improvement in speech segregation.
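The smoothing idea can be illustrated with a minimal sketch. It assumes the mask map is a binary time-frequency array (channels by frames) and that a small square structuring element is used. The function names, the 3 x 3 element size, and the order of operations (a closing to fill gaps, then an opening to remove particles) are illustrative assumptions, not details taken from the article.

```python
import numpy as np


def dilate(mask, size=3):
    """Binary dilation with a size x size square structuring element (size odd)."""
    pad = size // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    rows, cols = mask.shape
    out = np.zeros((rows, cols), dtype=bool)
    # A unit is set if ANY unit in its neighbourhood is set.
    for dr in range(size):
        for dc in range(size):
            out |= padded[dr:dr + rows, dc:dc + cols]
    return out


def erode(mask, size=3):
    """Binary erosion with a size x size square structuring element (size odd)."""
    pad = size // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    rows, cols = mask.shape
    out = np.ones((rows, cols), dtype=bool)
    # A unit survives only if ALL units in its neighbourhood are set.
    for dr in range(size):
        for dc in range(size):
            out &= padded[dr:dr + rows, dc:dc + cols]
    return out


def smooth_mask(mask, size=3):
    """Hypothetical smoothing stage: closing, then opening (assumed order)."""
    mask = np.asarray(mask, dtype=bool)
    pad = size // 2
    # Add a background margin so dilation is not truncated at the mask edges.
    work = np.pad(mask, pad, mode="constant", constant_values=False)
    closed = erode(dilate(work, size), size)    # fills small gaps in elements
    opened = dilate(erode(closed, size), size)  # removes small isolated particles
    return opened[pad:pad + mask.shape[0], pad:pad + mask.shape[1]]
```

On a toy mask containing a 4 x 6 block with a single missing unit plus one isolated unit elsewhere, `smooth_mask` returns the block with the gap filled and the isolated unit removed. The element size sets the scale of what counts as a "particle" or a "break", so in practice it would need tuning against the time-frequency resolution of the mask.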